@roomote roomote commented Jun 30, 2025

Summary

This PR fixes issue #5247 where codebase indexing was performing incorrect code parsing, creating arbitrary code fragments instead of complete function and class definitions.

Problem

The tree-sitter parser was generating chunks containing only 3-4 lines of random code (like try-catch blocks or simple assignments) rather than complete functions or classes. This was happening because the parser logic was incorrectly processing tree-sitter captures.

Solution

Fixed the implementation in parser.ts to:

  • Properly use tree-sitter captures to identify complete function and class definitions
  • Process captures to find definition boundaries and create chunks for entire functions/classes
  • Add logic to avoid duplicate processing of the same definition nodes
  • Maintain fallback chunking only when no valid definitions are found
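The flow described in the list above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DefinitionNode`, `Capture`, and `Chunk` shapes, the `chunkFromCaptures` helper, and the `FALLBACK_LINES` window size are all hypothetical stand-ins, and the captures are assumed to already point at the full definition node.

```typescript
// Hypothetical, minimal stand-ins for tree-sitter types; not the real API surface.
interface DefinitionNode {
  type: string; // e.g. "function_definition"
  startRow: number; // 0-based, inclusive
  endRow: number; // 0-based, inclusive
}

interface Capture {
  name: string; // capture name from the query (assumed naming)
  node: DefinitionNode; // assumed to be resolved to the whole definition already
}

interface Chunk {
  kind: string;
  startLine: number; // 1-based
  endLine: number; // 1-based
  text: string;
}

// Assumed fallback window size; the PR text does not state the real value.
const FALLBACK_LINES = 50;

function chunkFromCaptures(source: string, captures: Capture[]): Chunk[] {
  const lines = source.split("\n");
  const seen = new Set<string>();
  const chunks: Chunk[] = [];

  for (const { node } of captures) {
    // Several captures can refer to the same definition; emit it only once.
    const key = `${node.type}:${node.startRow}:${node.endRow}`;
    if (seen.has(key)) continue;
    seen.add(key);

    // One chunk spans the entire definition, not an inner fragment.
    chunks.push({
      kind: node.type,
      startLine: node.startRow + 1,
      endLine: node.endRow + 1,
      text: lines.slice(node.startRow, node.endRow + 1).join("\n"),
    });
  }

  if (chunks.length > 0) return chunks;

  // Fallback: fixed-size line windows, used only when no definitions were found.
  for (let i = 0; i < lines.length; i += FALLBACK_LINES) {
    const slice = lines.slice(i, i + FALLBACK_LINES);
    chunks.push({
      kind: "fallback",
      startLine: i + 1,
      endLine: i + slice.length,
      text: slice.join("\n"),
    });
  }
  return chunks;
}
```

Keying the deduplication on node type plus row range is one simple choice; a real implementation might key on node identity or byte offsets instead.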

Key Changes

  • Modified the capture processing logic (lines 125-200) to correctly identify definition nodes from captures
  • Added proper handling for both direct definition captures and name captures
  • Ensured chunks contain complete code definitions rather than arbitrary fragments
  • Added deduplication logic to prevent processing the same definition multiple times
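One way to handle both kinds of capture is to map each capture back to the definition node it belongs to. The sketch below is illustrative only. The `NodeLike` shape, the `resolveDefinitionNode` helper, the capture-name prefixes (`definition.` and `name.definition.`, following the common tree-sitter tags-query convention), and the set of definition node types are assumptions, since the capture names used in this PR are not shown in this text.

```typescript
// Hypothetical node shape loosely modelled on a tree-sitter syntax node.
interface NodeLike {
  type: string;
  startIndex: number;
  endIndex: number;
  parent: NodeLike | null;
}

interface Capture {
  name: string;
  node: NodeLike;
}

// Assumed capture-name prefixes; the real query names may differ.
const DEFINITION_PREFIX = "definition.";
const NAME_PREFIX = "name.definition.";

// Assumed definition node types (these two are Python grammar node types).
const DEFINITION_TYPES = new Set(["function_definition", "class_definition"]);

function resolveDefinitionNode(capture: Capture): NodeLike | null {
  // A direct definition capture already points at the whole definition.
  if (capture.name.startsWith(DEFINITION_PREFIX)) return capture.node;

  // A name capture points at the identifier only; walk up to the enclosing definition.
  if (capture.name.startsWith(NAME_PREFIX)) {
    let current: NodeLike | null = capture.node.parent;
    while (current !== null) {
      if (DEFINITION_TYPES.has(current.type)) return current;
      current = current.parent;
    }
  }

  // Any other capture is not treated as a definition.
  return null;
}
```

Both capture kinds for the same function resolve to the same node here, which is what makes a later deduplication step straightforward.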

Testing

  • Verified the fix works correctly with Python code containing functions and classes
  • Confirmed that complete definitions are now captured as single chunks
  • All existing tests pass and type checks are clean

Fixes #5247


Important

Fixes code parsing in parser.ts to correctly identify and process complete function/class definitions, adding deduplication and fallback mechanisms.

  • Behavior:
    • Fixes code parsing to correctly identify and process complete function/class definitions in parser.ts.
    • Adds deduplication logic to avoid processing the same definition multiple times.
    • Maintains fallback chunking when no valid definitions are found.
  • Implementation:
    • Modifies capture processing logic (lines 125-200) to identify definition nodes from captures.
    • Handles both direct definition captures and name captures.
    • Ensures chunks contain complete code definitions rather than arbitrary fragments.
  • Testing:
    • Verified with Python code containing functions and classes.
    • Confirmed complete definitions are captured as single chunks.
    • All existing tests pass and type checks are clean.

This description was created by Ellipsis for 843e2d9.

… level chunks

- Fixed CodeParser to properly use tree-sitter captures for identifying complete function and class definitions
- Previously, it created arbitrary code fragments instead of complete definitions
- Now correctly processes captures to identify definition boundaries and creates chunks for entire functions/classes
- Added logic to avoid duplicate processing of the same definition nodes
- Maintains fallback chunking only when no valid definitions are found
@roomote roomote requested review from cte, jr and mrubens as code owners June 30, 2025 16:57
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jun 30, 2025

delve-auditor bot commented Jun 30, 2025

No security or compliance issues detected. Reviewed everything up to 843e2d9.

Security Overview
  • 🔎 Scanned files: 1 changed file(s)
Detected Code Changes

The diff is too large to display a summary of code changes.


@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jun 30, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 7, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 7, 2025

Development

Successfully merging this pull request may close these issues.

The codebase indexing is performing incorrect code parsing.
